{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Atom & Residue\n",
    "Rosetta的Atom和Residue信息是被Conformation层所被记录。在Rosetta中原子和残基的处理方式和众多的分子模拟工具相类似。\n",
    "基于拓扑结构定义以及参数实例化的方式可以高效地处理每个“独立构象的”氨基酸残基。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1. 拓扑结构定义AtomType和ResidueType\n",
    "AtomType和ResidueType其实可以理解为模板文件，它定义了一个原子的基本信息以及氨基酸残基的基本信息。\n",
    "- AtomType: 记录了原子的元素名、电荷、LJ半径，LK溶剂体积等一些最基本的参数，这些参数可以在Rosetta的源代码中被找到。\n",
    "- ResidueType: 记录了残基的原子组成（部分AtomType的信息），原子的成键方式，二面角信息（理想结构坐标）。在Rosetta里被params文件所记录。\n",
    "\n",
    "![title](./img/paramsfile.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2. 从ResidueType生成Residue"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "有了氨基酸的模板信息，我们只要对氨基酸模板中的二面角参数进行“赋值”，就可以用这个氨基酸来描述PDB中任意相同类型氨基酸的具体构象了。\n",
    "\n",
    "Rosetta处理的逻辑:\n",
    "- 根据Pose中的残基名，找到对应的ResidueType(模板)，复制这个模板;\n",
    "- 根据PDB中的坐标信息，计算得到每个4原子间的二面角，并将这些信息填入ICOOR_INTERNAL(内坐标地)信息当中;\n",
    "- 根据新的ResidueType信息，生成一个独立的Residue对象。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "PyRosetta-4 2020 [Rosetta PyRosetta4.conda.mac.cxx11thread.serialization.python36.Release 2020.23+release.0d6f90a8cb9fa0567ca76bb71ee93bfe73340c70 2020-06-04T19:12:24] retrieved from: http://www.pyrosetta.org\n",
      "(C) Copyright Rosetta Commons Member Institutions. Created in JHU by Sergey Lyskov and PyRosetta Team.\n",
      "\u001b[0mcore.init: {0} \u001b[0mChecking for fconfig files in pwd and ./rosetta/flags\n",
      "\u001b[0mcore.init: {0} \u001b[0mRosetta version: PyRosetta4.conda.mac.cxx11thread.serialization.python36.Release r257 2020.23+release.0d6f90a8cb9 0d6f90a8cb9fa0567ca76bb71ee93bfe73340c70 http://www.pyrosetta.org 2020-06-04T19:12:24\n",
      "\u001b[0mcore.init: {0} \u001b[0mcommand: PyRosetta -ex1 -ex2aro -database /opt/miniconda3/envs/pyrosetta/lib/python3.6/site-packages/pyrosetta/database\n",
      "\u001b[0mbasic.random.init_random_generator: {0} \u001b[0m'RNG device' seed mode, using '/dev/urandom', seed=-2071681993 seed_offset=0 real_seed=-2071681993 thread_index=0\n",
      "\u001b[0mbasic.random.init_random_generator: {0} \u001b[0mRandomGenerator:init: Normal mode, seed=-2071681993 RG_type=mt19937\n",
      "\u001b[0mcore.import_pose.import_pose: {0} \u001b[0mFile './data/4R80.clean.pdb' automatically determined to be of type PDB\n",
      "\u001b[0mcore.conformation.Conformation: {0} \u001b[0m\u001b[1m[ WARNING ]\u001b[0m missing heavyatom:  OXT on residue SER:CtermProteinFull 76\n",
      "\u001b[0mcore.conformation.Conformation: {0} \u001b[0m\u001b[1m[ WARNING ]\u001b[0m missing heavyatom:  OXT on residue SER:CtermProteinFull 152\n"
     ]
    }
   ],
   "source": [
    "from pyrosetta import init, pose_from_pdb\n",
    "init()\n",
    "pose = pose_from_pdb('./data/4R80.clean.pdb')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 108,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Residue 1: PRO:NtermProteinFull (PRO, P):\n",
      "Base: PRO\n",
      " Properties: POLYMER PROTEIN CANONICAL_AA LOWER_TERMINUS ALIPHATIC METALBINDING ALPHA_AA L_AA\n",
      " Variant types: LOWER_TERMINUS_VARIANT\n",
      " Main-chain atoms:  N    CA   C  \n",
      " Backbone atoms:    N    CA   C    O    HA \n",
      " Side-chain atoms:  CB   CG   CD   NV  CAV  1HB  2HB  1HG  2HG  1HD  2HD  1H   2H  \n",
      "Atom Coordinates:\n",
      "   N  : 35.432, -0.708, 7.647\n",
      "   CA : 35.959, 0.478, 8.332\n",
      "   C  : 36.62, 1.469, 7.374\n",
      "   O  : 36.946, 1.11, 6.24\n",
      "   CB : 36.987, -0.1, 9.317\n",
      "   CG : 37.119, -1.563, 8.97\n",
      "   CD : 35.846, -1.957, 8.305\n",
      "   NV : 35.4257, -0.707336, 7.64284 (virtual)\n",
      "  CAV : 35.8996, 0.441347, 8.25002 (virtual)\n",
      "   HA : 35.141, 0.97715, 8.8721\n",
      "  1HB : 37.9438, 0.433685, 9.21849\n",
      "  2HB : 36.6417, 0.0476102, 10.3509\n",
      "  1HG : 37.9855, -1.72036, 8.31089\n",
      "  2HG : 37.3003, -2.15428, 9.87969\n",
      "  1HD : 36.0441, -2.75894, 7.57861\n",
      "  2HD : 35.1232, -2.2907, 9.06406\n",
      "  1H  : 35.7595, -0.700397, 6.70024\n",
      "  2H  : 34.4274, -0.649029, 7.64284\n",
      "Mirrored relative to coordinates in ResidueType: FALSE\n",
      "\n"
     ]
    }
   ],
   "source": [
    "# 通过Pose和residue的Pose编号，我们可以直接获取到Residue，并提取里面的具体信息:\n",
    "residue1 = pose.residue(1) \n",
    "residue2 = pose.residue(2) \n",
    "print(residue1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "通过Residue这个对象，我们可以很方便地获取到每个氨基酸的坐标残基以及部分原子级别的信息。\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.获取Residue细节信息"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 3.1 残基名"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "在Rosetta中端段和C端和处于肽链中的氨基酸残基的拓扑结构不同(原子数目不同),描述他们拓扑结构的params文件不能直接复用. \n",
    "为了高效描述和减少Params的文件数量，Rosetta提出了Patch系统。对于同一种氨基酸的不同状态，如\n",
    "形成了二硫键的半胱氨酸等都是以\"被修饰\"的状态进行处理, 因此为了区分，他们的命名也带上了补丁字样，比如上述的例子中，1号氨基酸名称为PRO:NtermProteinFull，其中NtermProteinFull就是他的\"补丁名\"。\n",
    "\n",
    "因此Rosetta中的残基名信息有好几种形式存在:\n",
    "- annotated_name: 带有Patch信息的残基名\n",
    "- name1: 单字母缩写\n",
    "- name3: 三字母缩写"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "P[PRO:NtermProteinFull] P PRO\n"
     ]
    }
   ],
   "source": [
    "# 获取残基的Rosetta残基名、单字母缩写、三字母缩写、标注名\n",
    "annotated_name = pose.residue(1).annotated_name()\n",
    "name1 = residue1.name1()\n",
    "name3 = residue1.name3()\n",
    "print(annotated_name, name1, name3)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 3.2 残基性质\n",
    "每种氨基酸残基都有自带的一些性质/属性标签。这些字段由Params文件中的property字段记录。通过residue对象的Properties输出的行，也可以轻松获取这些信息。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 114,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Residue 1: PRO:NtermProteinFull (PRO, P):\n",
      "Base: PRO\n",
      " Properties: POLYMER PROTEIN CANONICAL_AA LOWER_TERMINUS ALIPHATIC METALBINDING ALPHA_AA L_AA\n",
      " Variant types: LOWER_TERMINUS_VARIANT\n",
      " Main-chain atoms:  N    CA   C  \n",
      " Backbone atoms:    N    CA   C    O    HA \n",
      " Side-chain atoms:  CB   CG   CD   NV  CAV  1HB  2HB  1HG  2HG  1HD  2HD  1H   2H  \n",
      "Atom Coordinates:\n",
      "   N  : 35.432, -0.708, 7.647\n",
      "   CA : 35.959, 0.478, 8.332\n",
      "   C  : 36.62, 1.469, 7.374\n",
      "   O  : 36.946, 1.11, 6.24\n",
      "   CB : 36.987, -0.1, 9.317\n",
      "   CG : 37.119, -1.563, 8.97\n",
      "   CD : 35.846, -1.957, 8.305\n",
      "   NV : 35.4257, -0.707336, 7.64284 (virtual)\n",
      "  CAV : 35.8996, 0.441347, 8.25002 (virtual)\n",
      "   HA : 35.141, 0.97715, 8.8721\n",
      "  1HB : 37.9438, 0.433685, 9.21849\n",
      "  2HB : 36.6417, 0.0476102, 10.3509\n",
      "  1HG : 37.9855, -1.72036, 8.31089\n",
      "  2HG : 37.3003, -2.15428, 9.87969\n",
      "  1HD : 36.0441, -2.75894, 7.57861\n",
      "  2HD : 35.1232, -2.2907, 9.06406\n",
      "  1H  : 35.7595, -0.700397, 6.70024\n",
      "  2H  : 34.4274, -0.649029, 7.64284\n",
      "Mirrored relative to coordinates in ResidueType: FALSE\n",
      "\n"
     ]
    }
   ],
   "source": [
    "# 注意Properties的输出信息: POLYMER PROTEIN CANONICAL_AA LOWER_TERMINUS ALIPHATIC METALBINDING ALPHA_AA L_AA\n",
    "print(residue1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 109,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "False\n",
      "False\n",
      "False\n",
      "False\n",
      "True\n",
      "False\n"
     ]
    }
   ],
   "source": [
    "# 通过Residue Object还可以判断氨基酸化学性质,残基归类属性\n",
    "print(residue1.is_polar())\n",
    "print(residue1.is_aromatic())\n",
    "print(residue1.is_charged())\n",
    "print(residue1.is_DNA())\n",
    "print(residue1.is_protein())\n",
    "print(residue1.is_aramid())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "运行结果表明，Residue1是非极性非带电的蛋白质氨基酸。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 3.3 原子索引\n",
    "\n",
    "氨基酸的原子id信息也是由Params文件所定义，因此有固定的顺序, 并且从1开始编号。因此第一件需要做的事是索引原子信息:\n",
    "- 通过原子id进行索引\n",
    "- 通过原子名进行索引"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 78,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "2\n"
     ]
    }
   ],
   "source": [
    "# 通过原子名获取:\n",
    "ca_id = residue1.atom_index('CA') # 残疾内atom id\n",
    "print(ca_id)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 113,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "total atoms: 18\n",
      "1  N  \n",
      "2  CA \n",
      "3  C  \n",
      "4  O  \n",
      "5  CB \n",
      "6  CG \n",
      "7  CD \n",
      "8  NV \n",
      "9 CAV \n",
      "10  HA \n",
      "11 1HB \n",
      "12 2HB \n",
      "13 1HG \n",
      "14 2HG \n",
      "15 1HD \n",
      "16 2HD \n",
      "17 1H  \n",
      "18 2H  \n"
     ]
    }
   ],
   "source": [
    "# 获取总原子数量:\n",
    "natoms = residue1.natoms()\n",
    "print(f'total atoms: {natoms}')\n",
    "\n",
    "# 通过原子id获取原子信息:\n",
    "for atom_id in range(1, natoms+1):\n",
    "    print(atom_id, residue1.atom_name(atom_id))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 3.4 单个原子与坐标"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "获取原子坐标的方式有两种:\n",
    "1. 通过residue的xyz函数获取（封装，更方便）\n",
    "2. 通过原子对象获取"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 94,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "      35.95900000000000      0.4780000000000000       8.332000000000001\n",
      "      35.43200000000000     -0.7080000000000000       7.647000000000000\n"
     ]
    }
   ],
   "source": [
    "# 直接获取原子的xyz坐标。\n",
    "ca_xyz = pose.residue(1).xyz(\"CA\")\n",
    "atom1_xyz = pose.residue(1).xyz(1)\n",
    "print(ca_xyz)\n",
    "print(atom1_xyz)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "有了原子的id信息，就可以获取到具体的某个原子对象:\n",
    "- 通过原子名获取对象\n",
    "- 通过原子id获取对象"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 86,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 先获取原子对象，再获取坐标。\n",
    "# 1 原子名获取对象\n",
    "atom_ca = residue1.atom(\"CA\")\n",
    "\n",
    "# 2 通过原子id获取对象\n",
    "atom1 = residue1.atom(1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 95,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "      35.95900000000000      0.4780000000000000       8.332000000000001\n",
      "      35.43200000000000     -0.7080000000000000       7.647000000000000\n"
     ]
    }
   ],
   "source": [
    "print(atom_ca.xyz())\n",
    "print(atom1.xyz())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 3.5 Atomtype原子类型信息的获取\n",
    "通过atom_type方法，我们可以获取每个原子的细节的信息，如原子的Rosetta类型、原子的元素名、范德华半径、Lazaridis Karplus溶剂化参数等."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 105,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Atom Type: Nlys\n",
      "\telement: N\n",
      "\tLennard Jones: radius=1.80245 wdepth=0.161725\n",
      "\tLazaridis Karplus: lambda=3.5 volume=16.514 dgfree=-20.8646\n",
      "\tproperties: DONOR \n",
      "Extra Parameters: 1.75 1.55 0.79 1.55 1.44 1.5 1.55 -20 -10.695 -1.145 -20 -0.62 0 0 0 1.85 8.52379 0.025 0.01 0.005 -289.292 -0.697267 -1933.88 -1.56243 -93.2613 93.2593 0.00202205 715.165 74.6559 -74.6539 0.00268963 -1282.36 0.633 -0.367 0.926 -0.537 0.633 -0.367\n",
      "\n"
     ]
    }
   ],
   "source": [
    "# 获取atom所有的信息:\n",
    "atom_type1 = residue1.atom_type(1)\n",
    "print(atom_type1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 97,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Nlys\n"
     ]
    }
   ],
   "source": [
    "# 获取Rosetta的atomtype_name\n",
    "atomtype_name = atom_type1.atom_type_name()\n",
    "print(atomtype_name)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 98,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "N\n"
     ]
    }
   ],
   "source": [
    "# 元素名\n",
    "atom_element = atom_type1.element()\n",
    "print(atom_element)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 99,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1.802452 16.514 -20.864641\n"
     ]
    }
   ],
   "source": [
    "# LJ参数\n",
    "lj_radius = atom_type1.lj_radius()\n",
    "lk_volume = atom_type1.lk_volume()\n",
    "lk_dgfree = atom_type1.lk_dgfree()\n",
    "\n",
    "print(lj_radius, lk_volume, lk_dgfree)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "pyrosetta",
   "language": "python",
   "name": "pyrosetta"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
